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Abstract 

We present a new wavelet based method for the denoising of event related potentials 
(ERPs), employing techniques recently developed for the paradigm of deterministic 
chaotic systems. The denoising scheme has been constructed to be appropriate for 
short and transient time sequences using circular state space embedding. Its effec- 
tiveness was successfully tested on simulated signals as well as on ERPs recorded 
from within a human brain. The method enables the study of individual ERPs 
against strong ongoing brain electrical activity. 
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1 Introduction 



The electroencephalogram (EEG) reflects brain electrical activity owing to both 
intrinsic dynamics and responses to external stimuli. To examine pathways and 
time courses of information processing under specific conditions, several exper- 
iments have been developed controlling sensory inputs. Usually, well defined 
stimuli are repeatedly presented during experimental sessions (e.g., simple 
tones, flashes, smells, or touches). Each stimulus is assumed to induce syn- 
chronized neural activity in specific regions of the brain, occurring as potential 
changes in the EEG. These evoked potentials (EPs) often exhibit multiphasic 
peak amplitudes within the first hundred milliseconds after stimulus onset. 
They are specific for different stages of information processing, thus giving 
access to both temporal and spatial aspects of neural processes. Other classes 
of experimental setups are used to investigate higher cognitive functions. For 
example, subjects are requested to remember words, or perhaps they are asked 
to respond to specific target stimuli, e.g. by pressing a button upon their oc- 
currence. The neural activity induced by this kind of stimulation also leads 
to potential changes in the EEG. These event related potentials (ERPs) can 
extend over a few seconds, exhibiting peak amplitudes mostly later than EPs. 
Deviation of amplitudes and/or moment of occurrence (latency) from those of 
normal EPs/ERPs are often associated with dysfunction of the central nervous 
system and thus, are of high relevance for diagnostic purposes. 

As compared to the ongoing EEG, EPs and ERPs possess very low peak am- 
plitudes which, in most cases, are not recognizable by visual inspection. Thus, 
to improve their low signal-to-noise ratio, EPs/ERPs are commonly averaged 
(Figure 1), assuming synchronous, time-locked responses not correlated with 
the ongoing EEG. In practice, however, these assumptions may be inaccurate 
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and, as a result of averaging, variations of EP/ERP latencies and amplitudes 
are not accessed. In particular, short lasting alterations which may provide 
relevant information about cognitive functions are probably smoothed or even 
masked by the averaging process. Therefore, investigators are interested in 
single trial analysis, that allows extraction of reliable signal characteristics 
out of single EP/ERP sequences [1]. In ref. [2] autoregressive models (AR) 
are adopted to EEG sequences recorded prior to stimulation in order to sub- 
tract uncorrelated neural activity from ERPs. However, it is an empirical fact, 
that external stimuli lead to event-related-desynchronizaition of the ongoing 
EEG. Thus, the estimated AR-model might be incorrect. The authors of [3] 
applied autoregressive moving average (ARMA) models to time sequences 
which were a concatenation of several EP/ERP sequences. In the case of 
short signal sequences, this led to better spectral estimations than commonly 
achieved by periodograms. The main restriction is, however, that investigated 
signals must be linear and stationary, which cannot be strictly presumed for 
the EEG. In particular the high model order in comparison to the signal length 
shows that AR- and ARM A- models are often inadequate for EP/ERP analy- 
sis. Other methods have been developed to deal with the nonstationary and 
transient character of EPs/ERPs. Woody [4] introduced an iterative method 
for EP/ERP latency estimation based on common averages. He determined 
the time instant of the best correlation between a template (EP/ERP average) 
and single trials by shifting the latter in time. This method corrects a possible 
latency variability of EPs/ERPs, but its performance highly depends on the 
initial choice of templates. The Wiener filter [5,6], on the other hand, uses 
spectral estimation to reduce uncorrelated noise. This technique, however, is 
less accurate for EPs/ERPs, because the time course of transient signals is 
lost in the Fourier domain. Thus, DeWeerd [7,8] introduced a time adaptive 
Wiener filter, allowing better adjustment to signal components of short du- 
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ration. The paradigm of orthogonal wave packets {wavelet transforrr^ also 
follows this concept of adopted time-frequency decomposition. In addition, the 
wavelet transform provide several useful properties which make it preferable 
even for the analysis of transient signals [9-11]: 

• Wavelets can represent smooth functions as well as singularities. 

• The basis functions are local which makes most coefficient based algorithms 
to be naturally adapted to inhomogeneities in the function. 

• They have the unconditional basis property to represent a variety of func- 
tions implying that the wavelet basis is usually a reasonable choice even if 
very little is known about the signal. 

• Fast wavelet transform is computationally inexpensive of order 0{N), where 
N denotes the number of sample points. In contrast, fast Fourier transform 
(FFT) requires 0{Nlog{N)). 

• Nonlinear thresholding is nearly optimal for signal recovery. 

For that reasons, wavelets became a popular tool for the analysis of brain 
electrical activity [12-15], especially for denoising and classification of single 
trial EPs/ERPs. Donoho et al.[16] introduced a simple thresholding algorithm 
to reduce noise in the wavelet domain requiring no assumptions about the 
time course of signals. Nevertheless, high signal amplitudes are in need to 
distinguish between noise and signal related wavelet coefficients in single trials. 
Bertrand et al. [17] modified the original a posteriori Wiener filter to find 
accurate filter settings. The authors emphasized better adoption to transient 
signal components than can be achieved by corresponding techniques in the 
frequency domain. However, due to the averaging process, this technique runs 

^ Continuous wavelet transform: Wa,b{^ ,x{t)) = — J== x{t)"^{^^)dt 

w: wavelet coefficient, a: scaling parameter, b: translation parameter, x{t): time 
series, ^': mother wavelet function 
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the risk of choosing inadequate filter settings in the case of a high latency 
variability. The same restriction is valid for discriminant techniques applied 
e.g. by Bartink et al. [18,19]. Nevertheless, wavelet based methods enable a 
more adequate treatment of transient signals than techniques applied in the 
frequency domain. The question of accurate filter settings, however, is still an 
unresolved problem. 

To circumvent this problem, we introduce a new method for single trial anal- 
ysis of ERPs that neither assumes fully synchronized nor stationary ERP 
sequences. The method is related to techniques already developed for the 
paradigm of deterministic chaotic systems, using time delay embeddings of 
signals for state space reconstruction and denoising [20]. Schreiber and Ka- 
plan [21] demonstrated the accuracy of these methods to reduce measurement 
noise in the human electrocardiogram (ECG). Heart beats are also of transient 
character and exhibit relevant signal components in a frequency range that 
compares to ERPs. Unfortunately, ERPs are of shorter duration as compared 
to the ECG. Thus, in the case of high dimensional time delay embedding (in 
the order of the signal length), we cannot create a sufficient number of delay 
vectors for ERP sequences. To circumvent this problem we reconstruct ERPs 
in state-space using circular embeddings, that have turned out to be appro- 
priate even for signal sequences of short duration. In contrast to the nonlinear 
projection scheme described in [20], we do not use singular value decomposi- 
tion (SVD) to determine clean signals in state space. The reason for this is 
threefold. First, estimating relevant signal components using the infiexion of 
ordered eigen- values is not always apphcable to EEG because eigen- values may 
decay almost linearly. In this case, an a priori restriction to a fixed embedding 
dimension is in need, running the risk either to discard important signal com- 
ponents or to remain noise of considerable amplitude if only little is known 
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about the signal. Second, SVD stresses the direction of highest variances, so 
that transient signal components may be smoothed by projection. Third, the 
number of signal related directions in state space may alter locally, which is 
also not concerned by SVD. Instead we calculate wavelet transforms of de- 
lay vectors and determine signal related components by estimating variances 
separately for each state-space direction. Scaling properties of wavelet bases 
allow very fast calculation as well as focusing on specific frequency bands. 
To confirm the accuracy of our method, we apply it to ERP-hke test signals 
contaminated with different types of noise. Afterwards, we give an example of 
reconstructed mesial temporal lobe P300 potentials, that were recorded from 
within the hippocampal formation of a patient with focal epilepsy. 



2 Outline of the Method 



A time series may be contaminated by random noise allowing the measurement 
Un — ^n + ^n- H the mcasurcd time series is purely deterministic, it is restricted 
to a low- dimensional hyper-surface in state space. For the transient signals we 
are concerned with here, we assume this still to be valid. We hope to identify 
this direction and to correct yn by simply projecting it onto the subspace 
spanned by the clean data [22,21]. 

Technically we realize projections onto noise free subspaces as follows. Let Y = 
{yi,y2, ■ ■ ■ ,yN) denote an observed time sequence. Time-delay embedding of 
this sequence in a m-dimensional state space leads to state space vectors yn = 
{yn, ■ ■ ■ ,yn-(m-i)r), whcrc T is an appropriate time delay. In an embedding 
space of dimension m we compute the discrete wavelet transform [11,10,9] of 
all delay vectors in a small neighborhood of a vector y„ we want to correct. Let 
rn,j with j — 0, . . . ,k denote the indices of the k nearest neighbors of y„, and 
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for Yn itself, i.e. j — 0, and r„,o = "n. Thus, the first neighbor distances from 
y„ in increasing order are o?(F)W = ||y„-yr„,il| = min^/ ||y„-yr'||, d{YY^^ = 
l|yn - yr„,2ll = inin r'^rr,.i WYu " yr'||, etc, where ||y - y'|| is the Euchdean 
distance in state space. Now the important assumption is that the clean signal 
lies within a subspace of dimension d <^ m, and that this subspace is spanned 
by only a few basis functions in the wavelet domain. Let Wr„ ^ denote the fast 
wavelet transform [23,24] of y^^^. Futhermore, let C'i^''(wr„) — {^rnj)i denote 
the i*'* component of the centre of mass of w^^, and cr^ j the corresponding 
variance. In the case of neighbors owing to the signal {true neighbors), we can 
expect the ratio C^''\wr^) / a'^^^^ to be higher in signal than in noise related 
directions. Thus, a discrimination of noise and noise free components in state 
space is possible. Let 



Wn,i 

: else 



define a shrinking condition to carry out projection onto a noise free manifold 
[16]. The parameter A denotes a thresholding coefficient that depends on spe- 
cific qualities of signal and noise. Inverse fast wavelet transform of provides 
a corrected vector in state space, so that application of our projection scheme 
to all remaining delay vectors ends up with a set of corrected vectors, out of 
which the clean signal can be reconstructed. 



2. 1 Extension to multiple signals of short length 



Let Yi = {yi^i,yi^2, ■ ■ ■ ,yi,N) denote a short signal sequence that is repeatedly 
recorded during an experiment, where I = 1, . . . , L orders the number of rep- 
etitions. A typical example may be ERP recordings, where each Yi represents 
an EEG sequence following well defined stimuh. Time-delay embeddings of 
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these sequences can be written as y/,„ = {yi,n ■ ■ ■ ,yi,n-{m-i)T)- To achieve a 
sufficient number of delay vectors even for high embedding dimensions, we 
define circular embeddings by 



iyi,n, yi,l, yi,N, yi,N-(m-q)) V n < m, 



(2) 



so that all delay vectors with indices 1 < n < can be formed. Circular 
embeddings are introduced as the most attractive choice to handle the ends 
of sequences. Alternatives are (i) losing neighbors, (ii) zeropadding, and (iii) 
shrinking the embedding dimension towards the ends. However, discontinuities 
may occur at the edges, requiring some smoothing. For each Yi we define the 
smoothed sequence as 



y l,n,i 



f Q-i \2 

yi,n,ie ^ " ' 



yi,n,i 
yi,n,i(^ 



i-(N-q) -.2 



i < q 

q < i < N — q 
i> N - q 



(3) 



where q defines the window width in sample points, p the steepness of expo- 
nential damping, and i the time index. Time-delay embedding of several short 
sequences leads to a filling of the state space, so that a sufficient number of 
nearest neighbors can be found for each point. 



2.2 Parameter Selection 



Appropriate choice of parameters, in particular embedding dimension m, time 
delay r, thresholding coefficient A, as well as the number of neighbors k is 
important for accurate signal reconstruction in state space. Several methods 
have been developed to estimate "optimal" parameters, depending on specific 
aspects of the given data (e.g., noise level, type of noise, stationarity, etc.). 
These assume that the clean signal is indeed low dimensional, an assumption 



8 



we are not ready to make in the case of ERPs. Thus, we approached the 
problem of "optimal" parameters empirically. 

Parameters r and m are not independent from each other. In particular, high 
embedding dimensions allow small timc-dclays and vice versa. We estimated 
"optimal" embedding dimensions and thresholding coefficients on simulated 
data by varying m and A for a fixed r = 1. To allow fast wavelet transform, 
we chose m to be a power of 2. 

Repeated measurements, like in the case of EPs/ERPs, have a maximum num- 
ber of true neighbors which is given by kmax — L. In the case of identical signals 
this is the best choice imaginable. However, real EPs/ERPs may alter during 
experiments, and it seems more appropriate to use a maximum distance true 
neighbors are assumed to be restricted to. We define this distance by 

/2 ^'^ 

d{y)max^J^ E diy)n} (4) 
Z=l,n=l 

3 Model Data 

3.1 Generating test signals and noise 

To demonstrate the effectiveness of our denoising technique and to estimate 
accurate values for m, A, and L, wc applied it to EP/ERP-like test signals con- 
taminated with white noise and in-band noise. The latter was generated using 
phase randomized surrogates of the original signal [25] . Test signals consisted 
of 256 sample points and were a concatenation of several Gaussian functions 
with different standard deviations and amplitudes. To simulate EPs/ERPs 
not fully synchronized with stimulus onset, test signals were shifted randomly 
in time (normally deviated, std. dev.: 20 sample points, max. shift: 40 sample 
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points). Since even fast components of the test signal extended over several 
sample points, a minimum embedding dimensions m = 16 was required to 
cover any significant fraction of the signal. The highest embedding dimension 
was bounded by the length of signal sequences and the number of embedded 
trials, thus allowing a maximum of m = 256. However, if the embedding di- 
mension ism = N, neighborhood is not longer defined by local characteristics, 
and we can expect denoised signals to be smoothed in the case of multiple time 
varying components. 

3.2 Denoising of test signals 

Let Xi — {xi^i,xi^2, ■ ■ ■ ,xi^n) denote the l^^ signal sequence of a repeated 
measurement, Yi = {yi,i,yi,2, ■ ■ ■ ,yi,N) the noise contaminated sequence, and 
— iyi,i, yi,2, ■ ■ ■ , yi,N) the corresponding result of denoising. Then 



defines the noise reduction factor which quantifies signal improvement owing 
to the filter process. 

We determined r for test signals contaminated with white noise, using noise 
amplitudes ranging from 25% - 150%, and embedding dimensions ranging from 
16 - 128 (Figure 2a, Figure 3). Five repetitions for each parameter configu- 
ration were calculated using 5 embedded trials each. In the case of A < 2, 
the noise reduction factor was quite stable against changes of noise levels but 
depended on embedding dimension m and thresholding coefficient A. Best per- 
formance was achieved for 1.0 < A < 2.0 (r^aJ^^'^^^'° ~ '^■^)■ In the case of 
A > 4.0, most signal components were rejected, and as a result, the noise re- 
duction factor r increased linearly with noise levels, as expected. Figure 2b and 



r = — 





(5) 
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Figure 4 depict effects of denoising of 5 test signals contaminated with in-band 
noise. In comparison to white noise the performance decreased, but neverthe- 
less, enabled satisfactory denoising for 0.5 < A < 1.0 (i"mai^^''^^^'° — l-^)- 
Within this range, the noise reduction factor r depended weakly on noise lev- 
els. Note that the embedding dimension must be sufficiently high (m = 128) 
to find true neighbors. 

In order to simulate EPs/ERPs with several time- varying components, we used 
5 test signals which were again a concatenation of different Gaussian functions, 
each, however, randomly shifted in time (Figure 2c and Figure 5). In contrast 
to test signals with time fixed components, "optimal" embedding dimension 
depended on the thresholding coefficient A. Higher values of A required lower 
embedding dimensions and vice versa. Best results were achieved for 0.5 < 
A<2.0 (C=^^«'^=^-° = 3.2). 

Even for high noise levels, the proposed denoising scheme preserved finer struc- 
tures of original test signals in all simulations. Moreover, the reconstructed 
sequences were closer to the test signals than the corresponding averages, es- 
pecially for time varying signals. Power spectra showed that denoising took 
part in all frequency bands and was quite different from common low-, or 
band-pass filtering. Simulation indicated that "optimal" values of the thresh- 
olding coefficient were in the range 0.5 < A < 2.0. Best embedding dimension 
was found to be m = 128, since the ongoing background EEG can be assumed 
to be in-band with ERPs. The filter performance was quite stable against the 
number of embedded sequences, at least for L — 5, 10, 20. 

4 Real data 
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4-1 Data Acquisition 



We analyzed event related potentials recorded intracerebrally in patients with 
pharmacoresistent focal epilepsy [26]. Electroencephalographic signals were 
recorded from bilateral electrodes implanted along the longitudinal axis of 
the hippocampus. Each electrode carried 10 cylindrical contacts of nickel- 
chromium alloy with a length of 2.5 mm and an intercontact distance of 4 
mm. Signals were referenced to linked mastoids, amplified with a bandpass 
filter setting of 0.05 - 85.00 Hz (12dB/oct.) and, after 12 bit A/D conver- 
sion, continuously written to a hard disk using a sampling interval of 5760//S. 
Stimulus related epochs spanning 1480 ms (256 sample points) including a 
200 ms pre-stimulus baseline were extracted from recorded data. The mean of 
the pre-stimulus baseline was used to correct possible amplitude shifts of the 
following ERP epoch. 

In a visual odd-ball paradigm 60 rare (letter < x >, targets) and 240 frequent 
stimuli (letter < o >, distractors) were randomly presented on a computer 
monitor once every 1200 ± 200ms (duration: 100 ms, probability of occur- 
rence: l(<x>):5(<o>)). Patients were asked to press a button upon 
each rare target stimulus. This pseudo-random presentation of rare stimuli 
in combination with the required response is known to elicit the mesial tem- 
poral lobe (MTL) P300 potential in recordings from within the hippocampal 
formation [27] (cf. Figure 1). 

4-2 Results 

By simulation, we estimated a range in which "optimal" parameters of the 
filter can be expected. However, the quality of denoising ERP sequences could 
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not be estimated, because the clean signal was not known a priori. A rough 
estimation of filter performance was only possible by a comparison to ERP 
averages. Taking into account results of simulation as well as ERP averages, 
we estimated A = 0.6 and m — 128 to be the best configuration. 

Based on the empirical fact that specific ERP components exhibit peak am- 
plitudes within a narrow time range related to stimulus onset, we defined a 
maximum allowed time jitter of ±20 sample points (~ 116ms) true neighbors 
are assumed to be restricted to. This accelerated the calculation time and 
avoided false nearest neighbors. Figure 6 depicts several ERPs recorded from 
different electrode contacts within the hippocampal formation. The number 
of embedded sequences was chosen as L = 8. Comparing averages, we can ex- 
pect that the filter extracted the most relevant MTL-P300 components. Even 
for low amplitude signals reconstruction was possible, exhibiting higher am- 
plitudes in single trial data than in averages. As corresponding power spectra 
show, the 50 Hz power line was reduced but not eliminated after filtering. 
Especially low amplitude signals showed artifacts based on the 50 Hz power 
line. 

5 Conclusion 

In this study, we introduced a new wavelet based method for nonlinear noise 
reduction of single trial EPs/ERPs. We employed advantages of methods de- 
veloped for the paradigm of deterministic chaotic systems, that allowed de- 
noising of short and time variant EP/ERP sequences without assuming fully 
synchronized or stationary EEC 

Denoising via wavelet shrinkage does not require a priori assumptions about 
constrained dimensions, as is usually required for other techniques (e.g., singu- 



la 



lar value decomposition). Besides, it is more straight forward using thresholds 
depending on means and variances rather than initial assumptions about con- 
strained embedding dimensions. Moreover, the local calculation of thresholds 
in state space enables focusing on specific frequency scales, which may be ad- 
vantageous in order to extract signal components located within both narrow 
frequency bands and narrow time windows. 

Extension of our denoising scheme to other types of signals seems to be possi- 
ble, however, demands further investigations, since "optimal" filter parameters 
highly depend on signal characteristics. In addition, the noise reduction factor 
r docs not consider all imaginable features of signals investigators are possibly 
interested in, so that other measures may be more advantageous in specific 
cases. 

So far, we have not considered effects of smoothing the edges of signal se- 
quences. But since delay vectors as well as corresponding wavelet coefficients 
hold information locally, we can assume artifacts to be also constrained to the 
edges which we were not interested in. 

In conclusion, the proposed denoising scheme represents a powerful noise re- 
duction technique for transient signals of short duration, like ERPs. 
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Figure captions: 



Fig. 1: Examples of averaged ERPs recorded along the longitudinal axis of the 
hippocampal formation in a patient with epilepsy. Randomized presentation of 
target and standard stimuli is known to elicit the mesial temporal lobe P300, a 
negative deflection peaking at about 500 ms after stimulus onset (cf. Sect. 4.1 
for more details). Letters (a), (b), and (c) indicate recordings used for single 
trial analysis (cf. Figure 6). 

Fig. 2: Results of denoising test signals. Parts a) and b): contamination with 
white noise and in-band noise. Part c): time varying signal components and 
white noise contamination (see text for more details). Five calculations for each 
parameter configuration have been executed to determine standard deviations. 

Fig. 3: Nonlinear denoising applied to white noise contaminated test signals (5 
sequences embedded, each 256 sample points, randomly shifted in time (std. 
dev.: 20 sample points, max. shift: 40 sample points), noise amplitude 75%, 
m — 128, r = 1, A = 1.5). Power spectra in arbitrary units. For state space 
plots we used a time delay of 25 sample points. 

Fig. 4: Same as Figure 3 but for in-band noise and A = 0.75. 

Fig. 5: Same as Figure 3 but for Gaussian functions each randomly shifted in 
time and A = 0.75. 

Fig. 6: Examples of denoised MTL-P300 potentials (cf. Figure 1). Power spec- 
tra in arbitrary units. For state space plots we used a time delay of 25 sample 
points. 
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